Figure 5.13: Prediction Limit tab of the Configure Sanitas window (Options/Configure Sanitas) (EPA Standards)
Figure 5.14: Prediction Limit tab of the Configure Sanitas window (Options/Configure Sanitas) (UG Standards)
Test for Normality: By default, the background data are tested for normality prior to construction of a parametric prediction limit, using the Shapiro-Wilk/Francia normality test. If the normality test fails, a non-parametric prediction limit will be substituted. This default behavior can be modified to use other normality tests or alpha levels, or to skip the normality test entirely. Note that "Alpha based on n" (used in various places in the program) varies the alpha level based on the sample size: if n < 10, alpha = 0.1; if 10 <= n < 20, alpha = 0.05; if n >= 20, alpha = 0.01.
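The "Alpha based on n" rule can be written as a small lookup. The following Python sketch is for illustration only; the function name is hypothetical and is not part of Sanitas.

```python
def alpha_based_on_n(n: int) -> float:
    """Return the alpha level that the "Alpha based on n" rule assigns to a sample of size n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    if n < 10:
        return 0.1
    if n < 20:
        return 0.05
    return 0.01


# Show the alpha level on each side of the two cutoffs (n = 10 and n = 20).
for n in (9, 10, 19, 20):
    print(n, alpha_based_on_n(n))
```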
Use Non-Parametric Test when Non-Detects Percent > __: The value entered in this text box is the cutoff point for the percentage of censored values (values beginning with a less-than sign) in background, beyond which the non-parametric prediction limit will be used.
Use ___ when Non-Detects Percent > __: Sanitas will optionally use the selected non-detect adjustment when the percentage of censored values in background exceeds the user-specified value.
Optional Further Refinement: Use ___ when Non-Detects Percent > __: When this checkbox is checked, Sanitas will use the selected non-detect adjustment when the percentage of censored values in background exceeds the user-specified value. In this way, Sanitas can be configured to use, for example, Cohen’s Adjustment when non-detects are between 15 and 50%, and Aitchison’s Adjustment when non-detects exceed 50%.
Use Poisson Prediction Limit when Non-Detects Percent > __: Sanitas will optionally substitute a Poisson Prediction Limit when the percentage of censored values in background exceeds the user-specified value. This option takes priority over Use ___ when Non-Detects Percent > __ if both are set to the same value.
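One way to picture how these non-detect cutoffs interact is the Python sketch below. It is not Sanitas code. It assumes that the rule with the highest cutoff exceeded by the non-detect percentage is the one applied, and that ties go to the rule listed first (so Poisson is listed ahead of the adjustments, as stated above). The function name and the 75% non-parametric cutoff are hypothetical; the 15% and 50% cutoffs come from the example above.

```python
def choose_method(nd_percent, rules):
    """Pick the method for a given percentage of non-detects in background.

    rules is a list of (cutoff_percent, label) pairs in tie-break priority
    order. ASSUMPTION: the highest exceeded cutoff wins, and on a tie the
    rule listed first wins.
    """
    best = None
    for cutoff, label in rules:
        # Strict comparison on the cutoff keeps the earlier rule on ties.
        if nd_percent > cutoff and (best is None or cutoff > best[0]):
            best = (cutoff, label)
    return best[1] if best else "parametric limit, no adjustment"


# Hypothetical configuration (the 75% cutoff is illustrative only).
rules = [
    (75, "non-parametric prediction limit"),
    (50, "Poisson prediction limit"),
    (50, "Aitchison's Adjustment"),
    (15, "Cohen's Adjustment"),
]

for pct in (10, 30, 60, 80):
    print(pct, "->", choose_method(pct, rules))
```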
Deseasonalize…: The data set (including both the background and compliance data as a group) can be deseasonalized, either based on the results of the seasonality test described elsewhere in this document, or regardless. Because the seasonality test requires more data than is needed to deseasonalize, the option also exists to deseasonalize when the sample size is insufficient to test (but sufficient to deseasonalize).
Always Use Non-Parametric: When this option is checked, selecting a Prediction Limit from the Analysis menu is the equivalent of selecting a Non-Parametric Prediction Limit.
Facility α (UG Standards): Click to see the alpha levels that were calculated based on the facility parameters entered.
Statistical Evaluations Per Year (UG Standards): Enter the number of times per year that statistical evaluations are run, e.g. for quarterly analysis enter 4. This value will be used to partition the annual sitewide false positive rate (fixed in the UG tables and in Sanitas at 0.1) to obtain the per-evaluation sitewide false positive rate.
Constituents Analyzed (UG Standards): Enter the number of distinct parameters that will be analyzed. This value will be used to partition the per-evaluation sitewide false positive rate to obtain the per-constituent false positive rate.
Downgradient (Compliance) Wells (UG Standards): Enter the number of downgradient wells that will be included in the analysis. This value will be used to partition the per-constituent false positive rate to obtain the false positive rate for intrawell reports.
Enter the number of statistical evaluation periods per year (nE), number of constituents (c), and number of monitoring wells (w). The annual target facility-wide false positive rate should be no greater than 10% (cumulative throughout the year). If a facility samples semi-annually, for instance, the overall target rate is distributed evenly among the sampling events, giving each sampling event a 5% target rate (α = .10/2 = .05 = 5%). The individual test alpha (α*) then equals the targeted per-event false positive rate divided by the total number of statistical tests (r).
For example, a site which samples semi-annually for 15 constituents at 7 wells would have the following per-test alpha levels:
Semi-annual target rate: α = .10/2 = .05 = 5%
Total # of tests: r = c × w = 15 × 7 = 105
Per-test alpha level: α* = α/r = .05/105 ≈ .00048
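The arithmetic of this worked example can be reproduced with a few lines of Python. This is a sketch of the simple division described in the text, not of the calculation Sanitas performs behind the Facility α button; the function and parameter names are hypothetical.

```python
def per_test_alpha(evaluations_per_year, constituents, wells, annual_rate=0.10):
    """Partition the annual sitewide false positive rate by simple division.

    Returns (per-event rate, total number of tests, per-test alpha).
    """
    per_event = annual_rate / evaluations_per_year  # e.g. 0.10 / 2
    tests = constituents * wells                    # r = c x w
    return per_event, tests, per_event / tests


# Semi-annual sampling, 15 constituents, 7 wells (the example in the text).
per_event, tests, alpha_star = per_test_alpha(2, 15, 7)
print(f"per-event rate = {per_event:.2f}")
print(f"total tests    = {tests}")
print(f"per-test alpha = {alpha_star:.6f}")  # 0.05 / 105 is about 0.00048
```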
Sampling Plan (UG Standards): Select the resampling and/or means/medians plan for your facility. Some combinations are not available; for example, there is no “1 of 1” plan for “individual observations” defined in the Unified Guidance (this specific configuration is available under EPA, CA or ASTM Standards).
Per-report alpha levels under Unified Guidance standards are calculated based on facility parameters. If instead you would like to enter a specific number of reports to be run, you may do so as follows. First, enter 1 in the Statistical Evaluations per Year field. For interwell, enter the number of reports in the Constituents Analyzed field, and enter 1 for Compliance (Downgradient) wells. For intrawell, enter the number of reports in the Compliance (Downgradient) wells field, and enter 1 for Constituents Analyzed.
Complete the site configuration by specifying whether prediction limits will be constructed based on future observations, means of order 2, or means of order 3. If prediction limits will be constructed for future observations, a resample program must be selected (1 of 2, 1 of 3, 1 of 4, or 2 of 4 Modified CA Plan). When the initial sample exceeds its predicted limit, the appropriate number of resamples is taken to verify the initial finding. If the resample is within its predicted limit, the initial exceedance is considered a false finding and the resample will replace the exceeded value for any future statistical analyses.
The first number in each of the plans indicates how many resamples must pass the predicted limit in order to declare the initial exceedance a false finding. The second number indicates the “total” number of samples required (i.e. the initial sample plus all resamples). For instance, the 1 of 3 plan means that when an initial exceedance is noted, two resamples are required and one of them must pass the limit in order to declare the initial exceedance a false finding.
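The plan notation described above can be sketched as a pass/fail check in Python. The sketch assumes an upper limit only, treats a value equal to the limit as passing, and takes all resamples at once, whereas in practice resampling may stop as soon as the outcome is decided. The function name is hypothetical and this is not Sanitas code.

```python
def is_false_finding(initial, resamples, limit, plan=(1, 2)):
    """Apply an "x of y" resample plan to an initial exceedance.

    plan = (passes_needed, total_samples), e.g. (1, 3) for the 1 of 3 plan.
    Returns True if enough resamples fall within the limit to declare the
    initial exceedance a false finding.
    """
    passes_needed, total = plan
    if initial <= limit:
        raise ValueError("initial sample does not exceed the limit")
    if len(resamples) != total - 1:
        raise ValueError(f"plan requires {total - 1} resample(s)")
    passes = sum(1 for value in resamples if value <= limit)
    return passes >= passes_needed


# 1 of 3 plan: two resamples, at least one must be within the limit.
print(is_false_finding(12.0, [13.0, 9.5], limit=10.0, plan=(1, 3)))
# 2 of 4 plan: three resamples, at least two must be within the limit.
print(is_false_finding(12.0, [9.0, 11.0, 12.5], limit=10.0, plan=(2, 4)))
```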
User-Set k (EPA/ASTM/CA): k is the number of compliance points to be plotted against the limit, and is used to construct the limit with a pre-specified alpha level. In most cases one point per compliance well (or per report in intrawell analyses) is compared to the limit, and this behavior is realized by entering 1 in the Compliance Observations Per Well field. The Compliance Wells (for Interwell) field is, as the caption implies, only used in Interwell analyses. Leaving either field blank will cause Sanitas to use the actual number of points per well or compliance wells, respectively. In other words, if both fields are left blank all compliance data points will be compared to the limit.
This option allows you to adjust the number of future comparisons that will be used in the calculation of Prediction Limits. Sanitas Technologies does not recommend that you change the default “K” value, except possibly in the case where limits are constructed with no compliance data and need to account for the total number of comparisons for an entire year. Please note that increasing the default K value will result in higher limits as well as increased false positive and/or false negative rates.
In the case of Parametric Intrawell Prediction Limits, Sanitas by default assumes the user is constructing a limit to contain one future value. When more than one compliance value is selected to be compared against the limit, the overall alpha (which is a result of the individual alpha multiplied by the total number of comparisons) increases.
Alternatively, if you wish to distribute the overall default alpha evenly among the number of future comparisons (up to four), effectively changing the “K” value and creating higher limits, you may do so by modifying the alpha level. The overall alpha should be divided by the number of future comparisons per well. For example, if you sample quarterly and wish to construct a limit assuming four future samples, you could change the default alpha (currently .01 under EPA Standards) to .0025 (.01 divided by 4). The overall alpha would remain constant, but the individual alpha decreases.
Another scenario where you may wish to effectively change the “K” value would be if you performed statistics once each year, and compared all of the sampled values from a particular well (quarterly or semi-annual sampling) to the limit. In this case, you have the option of modifying the alpha prior to performing the Intrawell Prediction Limit (i.e. the desired overall alpha divided by the number of compliance points). For example, if you sample semi-annually and compare the two values simultaneously to a limit, you could modify the default alpha of .01 to .005 (i.e. .01 divided by 2). This would maintain a lower testwise false positive rate.
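The two alpha adjustments described above amount to dividing the desired overall alpha by the number of comparisons. The Python sketch below reproduces those figures. It also compares the product rule used in the text (individual alpha times number of comparisons) with the familywise rate that would hold if the comparisons were independent. The independence formula is added here for comparison only and is not taken from the Sanitas documentation; all function names are hypothetical.

```python
def modified_alpha(overall_alpha, comparisons):
    """Individual alpha to enter so that `comparisons` tests share overall_alpha."""
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    return overall_alpha / comparisons


def overall_alpha_product(individual_alpha, comparisons):
    """Overall alpha as described in the text: individual alpha times comparisons."""
    return individual_alpha * comparisons


def overall_alpha_independent(individual_alpha, comparisons):
    """Overall alpha ASSUMING independent comparisons: 1 - (1 - alpha)^k."""
    return 1 - (1 - individual_alpha) ** comparisons


for k in (4, 2):  # quarterly and semi-annual examples from the text
    a = modified_alpha(0.01, k)
    print(k, a, overall_alpha_product(a, k), overall_alpha_independent(a, k))
```

The product rule is a slightly conservative upper bound on the independent-comparisons rate, so dividing alpha by the number of comparisons keeps the overall rate at or just below the target.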
In the case of Parametric Interwell Prediction Limits, you may specify the number of compliance points from each well, as well as the number of wells anticipated to be included in the analysis. When the Prediction Limit is constructed, if the number of wells selected for the analysis is greater than the number of wells specified in the setup window, Sanitas will construct the limit based on the actual number of wells selected, and the overall alpha level will be calculated accordingly.
In the case of Non-Parametric Intrawell and Interwell Prediction Limits, the alpha level is determined by the number of samples in the background data used to construct the limit. Changing the number of future comparisons will affect the overall false positive rate, which depends on the total number of comparisons (the number of comparisons selected per well times the number of wells).
Again, leave either field blank to use the actual number plotted, i.e. the number (of compliance points and/or wells as the case may be) in the selected data set.
Use Modified Alpha (EPA/ASTM/CA): The alpha level, or false positive rate, used in parametric prediction limits can be adjusted by use of this option. This feature should generally only be used in consultation with your regulator or a professional statistician. Note: California users should ordinarily use the state-approved formula to compute alpha levels. When CA Standards is selected, a dialog box opens, and the Modified Alpha option is calculated automatically upon entry of the relevant site information.
Use Ladder of Powers: When selected, the system will use a variety of power transformations, in conjunction with the selected normality test and the Use Best W Stat checkbox described below, in an attempt to normalize the distribution for use with the parametric prediction limit. The following transformations are made in the order shown: x, x^(1/2), x^2, x^(1/3), x^3, ln(x), x^4, x^5, x^6. Note: in Sanitas (as in the Guidance documents Sanitas is derived from) the terms log and ln are interchangeable.
Natural Log or No Transformation: This option will attempt only a ln(x) transformation in conjunction with the selected normality test and in conjunction with the Use Best W Stat option below. If the normality test fails on the raw and the ln(x) transformed data, the non-parametric prediction limit will be used.
Never Transform: If the normality test fails on the raw data, the non-parametric prediction limit will be used.
Use Specific Transformation: This option allows the user to specify a transformation which will be run regardless of the distribution of the data set. The normality test can still optionally be run, with the results displayed on the report.
Use best W-Stat: When selected, the system will choose the data transformation that best normalizes the data as determined by the highest normality W-statistic (See the statistical write-up for W-Statistic test description). If this option is deselected, the program will use the first (least extreme) data transformation that normalizes the data.
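The interaction between the ladder of powers and the Use Best W Stat option can be sketched in Python as follows. This is not the Sanitas implementation. As a stand-in for the normality test it uses an approximate Shapiro-Francia W' (the squared correlation between the ordered data and Blom normal scores), and it takes the pass/fail threshold as a hypothetical critical_w parameter, whereas a real test looks up a critical value from the sample size and alpha level. All data are assumed to be positive and not all equal.

```python
import math
from statistics import NormalDist, mean

# Transformations in the order listed for the Ladder of Powers option.
LADDER = [
    ("x", lambda x: x),
    ("x^(1/2)", math.sqrt),
    ("x^2", lambda x: x ** 2),
    ("x^(1/3)", lambda x: x ** (1 / 3)),
    ("x^3", lambda x: x ** 3),
    ("ln(x)", math.log),
    ("x^4", lambda x: x ** 4),
    ("x^5", lambda x: x ** 5),
    ("x^6", lambda x: x ** 6),
]


def shapiro_francia_w(values):
    """Approximate Shapiro-Francia W': squared correlation with Blom normal scores."""
    n = len(values)
    xs = sorted(values)
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, ms = mean(xs), mean(scores)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy * sxy / (sxx * sss)


def pick_transformation(values, critical_w, use_best_w=True):
    """Return (name, W) of the chosen transformation, or None if none passes.

    use_best_w=True  -> highest W among the transformations that pass.
    use_best_w=False -> first passing transformation in ladder order.
    None means the caller would fall back to a non-parametric limit.
    """
    results = [(name, shapiro_francia_w([f(v) for v in values])) for name, f in LADDER]
    passing = [(name, w) for name, w in results if w >= critical_w]
    if not passing:
        return None
    return max(passing, key=lambda item: item[1]) if use_best_w else passing[0]


# Illustrative right-skewed background data (made-up values).
background = [0.4, 0.7, 0.9, 1.1, 1.5, 2.0, 2.8, 4.1, 6.5, 12.0]
print(pick_transformation(background, critical_w=0.9, use_best_w=True))
print(pick_transformation(background, critical_w=0.9, use_best_w=False))
```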
Plot Transformed Values: When selected, the report will show values on a scale that reflects the power transformation, if any, that was used on the data. Otherwise the scale will reflect the original units of measurement regardless of any power transformation used.
Stop if Background Trend Detected: Set this option to cause Sanitas to stop (not produce a report) if the intrawell background data show a significant trend using the Mann-Kendall trend test at the specified alpha level.
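For readers unfamiliar with the Mann-Kendall test, the following Python sketch shows the basic computation behind such a check. It is a textbook version and makes no claim about how Sanitas implements the test: it uses the normal approximation with a continuity correction, applies no tie correction to the variance, and treats the test as two-sided. The function names and the default alpha of 0.01 are assumptions for illustration.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for values in time order."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S, ignoring ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p


def background_has_trend(values, alpha=0.01):
    """True if the Mann-Kendall p-value is below alpha."""
    return mann_kendall(values)[2] < alpha


print(background_has_trend(list(range(1, 13))))            # steadily increasing
print(background_has_trend([1, 2, 1, 2, 1, 2, 1, 2, 1, 2]))  # no clear trend
```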
Plot Background Data: Plots the raw intrawell background data in addition to the compliance data and limits. (By default interwell background wells are not plotted on the report, but this too is an option, set in the Sanitas9.ini file as PREDICTION_LIMIT/ PLOT_INTERWELL_BACKGROUND.)
Override Standard Deviation and/or Degrees of Freedom: The Unified Guidance document describes a process for enlarging the degrees of freedom via a pooled estimate of the standard deviation. These two options were provided to allow for this kind of flexibility. Any valid degrees of freedom and/or standard deviation may be explicitly set. These options apply to both inter- and intra-well prediction limits. In general these options should only be used by knowledgeable users, and/or under the supervision of a professional statistician.
Override Kappa: This option applies to both inter- and intra-well prediction limits. In general this option should only be used by knowledgeable users, and/or under the supervision of a professional statistician.
Automatically Remove Background Outliers: When selected, the system will run the outlier test specified on the Other Tests tab on the background data and remove any statistical outliers found prior to constructing the report (whether parametric or non-parametric, inter- or intrawell). Note that individual values can be flagged to prevent their being removed in this process. See the OUTLIER section of the sanitas9.ini file.
Two-Tailed Test Mode: When selected, the system will use two-tailed tests (both parametric and non-parametric). Note that by default two-tailed mode is automatically used for pH and alkalinity, so in some cases explicitly checking this checkbox may be unnecessary.
Show Deselected Data: When checked, Sanitas will add points on the report to indicate values that are present but “deselected”. This can include points that are manually unchecked in the current View, and also values that are deselected by flag (see option on the Data tab). When this option is checked, the accompanying drop down list is enabled, containing a choice of visibility options for these points.
Non-Parametric Limit = (UG Standards): This option is disabled under certain configurations. The Unified Guidance allows for the use of the second highest background value as the non-parametric prediction limit, as opposed to the default use of the highest background value. This option controls which background value (the highest, n, or the second highest, n-1) is used as the non-parametric prediction limit. Alpha levels will vary accordingly.
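The choice between the highest and second highest background value, and its effect on alpha, can be illustrated with the Python sketch below. It considers an upper limit and a single future comparison with no resampling, for which the false positive rate is 1/(n+1) when the highest of n background values is used and 2/(n+1) when the second highest is used. The alpha levels Sanitas reports under UG Standards also reflect the resample plan and facility parameters, so they will differ from these figures. Function names are hypothetical.

```python
def nonparametric_limit(background, use_second_highest=False):
    """Upper non-parametric prediction limit: highest or second highest background value."""
    ordered = sorted(background)
    if use_second_highest:
        if len(ordered) < 2:
            raise ValueError("need at least 2 background values")
        return ordered[-2]
    if not ordered:
        raise ValueError("need at least 1 background value")
    return ordered[-1]


def single_comparison_alpha(n, use_second_highest=False):
    """False positive rate for ONE future value, no resampling (simplifying assumption)."""
    return (2 if use_second_highest else 1) / (n + 1)


background = [3.1, 9.4, 5.0, 7.2, 4.8]  # made-up values
for second in (False, True):
    print(nonparametric_limit(background, second),
          single_comparison_alpha(len(background), second))
```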
Non-Parametric Limit when 100% Non-Detects: Ordinarily a non-parametric prediction limit is set at the highest (or second highest if selected) background value. However, when all of the background values are censored, it is sometimes recommended that either the most recent detection limit or the most recent PQL, if available, be used to establish the limit. These alternative methods can be specified here. As a reminder, when Sanitas refers to the PQL, it means the syntax <MDL+PQL[+Observed]. See the data file format sections for more details on input formats for trace values.